64 research outputs found

La minería de datos, entre la estadística y la inteligencia artificial

Author: Aluja Banet Tomàs
Publication venue: Institut d'Estadística de Catalunya
Publication date: 01/01/2001

Over the past decade we have witnessed the emergence of a new concept in the business world: data mining. Some companies have set up data mining units closely tied to senior management, and sessions devoted to data mining have taken centre stage at business forums. Data mining is presented as a new discipline, linked to Artificial Intelligence and distinct from Statistics. In the more academic statistical world, on the other hand, data mining was initially regarded as one more fad, arriving after expert systems and long known under the name "data fishing". Is this really so? In this article we address the statistical roots of data mining and the problems it deals with, give an overview of its current scope, present an example of its application to television audience data and, finally, offer a view of the future.

UPCommons. Portal del coneixement obert de la UPC

Descripció i classificació de les comarques catalanes en regions homogènies segons l'ús de la terra

Author: Aluja Banet Tomàs
Publication date: 01/01/1986

The theme of this article is the application of techniques of exploratory statistics to the study of comprehensive numerical tables consisting of statistics of a spatial nature. The sheer volume of statistics compiled over a large area, as in the case of a population census, frequently makes it difficult to assimilate all the information they contain. It is shown that these techniques of analysis make possible a profound understanding of such statistics without resorting to inspection of the tables themselves. The objectives usually pursued are: (1) to emphasize the most outstanding characteristics of the statistics, such as associations and/or contrasts among the elements under study, an objective which is easily fulfilled through methods of descriptive factorial analysis; (2) to group the basic elements of study into a limited number of representative classes, which can likewise be easily achieved through a simple algorithm of ascending hierarchical classification. The application of this method demonstrates the compatibility of the two results. This normally corresponds to the final stage in the study of statistical tables, in which observations relate to small areas points. The natural desire to make the classes obtained coincide with geographical regions made necessary the introduction of the content relationship within the algorithm of ascending hierarchical classification. The application undertaken makes it possible to identify improvements in the interpretation of the classes obtained.

Postprint (published version)
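
The constrained classification described in this abstract can be illustrated with a minimal Python sketch. It assumes that the relationship added to the algorithm acts as a spatial adjacency constraint (only neighbouring areas may merge) and that clusters are compared by squared distance between centroids. Both are assumptions made for illustration; the function name `constrained_clustering`, the toy areas and their values are hypothetical and are not taken from the article.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def constrained_clustering(
    points: Dict[str, Sequence[float]],
    neighbours: Iterable[Tuple[str, str]],
    k: int,
) -> List[List[str]]:
    """Agglomerative clustering in which only adjacent clusters may merge."""
    clusters = {name: [name] for name in points}
    adjacency = {name: set() for name in points}
    for a, b in neighbours:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def centroid(members: List[str]) -> List[float]:
        dim = len(points[members[0]])
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    def dist2(c1: List[float], c2: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(c1, c2))

    while len(clusters) > k:
        best = None
        # Look only at pairs of clusters that share a border.
        for a in clusters:
            for b in adjacency[a]:
                if a < b:
                    d = dist2(centroid(clusters[a]), centroid(clusters[b]))
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:
            break  # no adjacent pair left to merge
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
        # The merged cluster inherits the neighbours of the absorbed one.
        for n in adjacency.pop(b):
            adjacency[n].discard(b)
            if n != a:
                adjacency[n].add(a)
                adjacency[a].add(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical areas laid out in a row: A - B - C - D.
toy_points = {"A": (0.0,), "B": (5.0,), "C": (0.1,), "D": (5.2,)}
toy_borders = [("A", "B"), ("B", "C"), ("C", "D")]
print(constrained_clustering(toy_points, toy_borders, 2))
```

In this toy layout A and C have nearly identical values but do not touch, so they can only end up in the same class through the area that lies between them. That is the effect a geographical constraint is meant to have.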

UPCommons. Portal del coneixement obert de la UPC

The Longitudinal nature of patent value and technological usefulness exploring PLS structural equation models

Author: Aluja Banet Tomàs
Martínez Ruiz Alba
Publication date: 01/01/2010

The purpose of this paper is to investigate the evolution of patent value and technological usefulness over time using longitudinal structural equation models. The variables are modeled as endogenous unobservable variables which depend on three exogenous constructs: the knowledge stock used by companies to create their inventions, the technological scope of the inventions and the international scope of protection. Two set-ups are explored. The first longitudinal model includes time-dependent manifest variables and the second includes time-dependent unobservable variables. The structural equation models are estimated using Partial Least Squares Path Modelling. We show that there is a trade-off between the exogenous latent variables and technological usefulness over time. This means that the former variables become less important and the latter more important as time passes.

Preprint

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PRESISTANT: Learning based assistant for data pre-processing

Author: Abelló Alberto
Aluja-Banet Tomàs
Bilalli Besim
Wrembel Robert
Publication date: 02/03/2018

Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the knowledge required to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Perfil profesional del ingeniero informático: diagnóstico basado en competencias

Author: Aluja Banet Tomàs
Martínez Ruíz Alba
Sánchez Carracedo Fermín
Publication date: 01/01/2009

Universities must train the engineers that society needs. Curricula in the European Higher Education Area (EEES) must therefore be designed on the basis of the professional competences that society requires. Each school nevertheless has its own idiosyncrasies, and must choose the competences its graduates will have on completing their studies and design its curriculum around those competences. The choice of competences defines the professional profile of its graduates, so objective evidence is needed to make this choice properly. This article presents the results of surveys of several hundred professionals and of a group of students and teaching staff at the Facultat d'Informàtica de Barcelona. The surveys show how much importance professionals attach to each competence, and thus define a professional profile. They also show how teaching staff and students perceive their learning.

Peer Reviewed

Repositorio Institucional de la Universidad de Alicante

UPCommons. Portal del coneixement obert de la UPC

Secretaría de Estado de Cultura

Disseny del Pla de Mostreig per l’estimació de la fracció de Residus Resta en la bossa tipus de Catalunya

Author: Aluja Banet Tomàs
Montero Mercadé Lídia
Publication date: 01/04/2009

Final report of PHASE 1 of the minor services contract awarded by Barcelona Ecologia to the Universitat Politècnica de Catalunya.

Preprint

UPCommons. Portal del coneixement obert de la UPC

Intelligent assistance for data pre-processing

Author: Abelló Gamazo Alberto
Aluja Banet Tomàs
Bilalli Besim
Wrembel Robert
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around. Typically, a dataset needs to be pre-processed before being mined. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives. As a consequence, inexperienced users become overwhelmed with pre-processing alternatives. In this paper, we show that the problem can be addressed by automating the pre-processing with the support of meta-learning. To this end, we analyzed a wide range of data pre-processing techniques and a set of classification algorithms. For each classification algorithm that we consider and a given dataset, we are able to automatically suggest the transformations that improve the quality of the results of the algorithm on the dataset. Our approach will help non-expert users to identify more effectively the transformations appropriate to their applications, and hence to achieve improved results.

Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

On the effect of measurement model misspecification in PLS Path Modeling: the reflective case

Author: Aluja-Banet Tomàs
Ciampi Antonio
Lamberti Giuseppe
Minotti Simona C.
Publication venue: 'University Library/University of Twente'
Publication date: 01/01/2015

The specification of a measurement model as reflective or formative is the object of a lively debate. Part of the existing literature focuses on measurement model misspecification. This means that a true model is assumed and the impact on the path coefficients of using a wrong model is investigated. The majority of these studies are restricted to Structural Equation Modeling (SEM). Regarding PLS Path Modeling (PLS-PM), a few authors have carried out simulation studies to investigate the robustness of the estimates, but their focus is the comparison with SEM. The present paper discusses the misspecification problem in the PLS-PM context from a novel perspective. First, a real application on Alumni Satisfaction will be used to verify whether different assumptions for the measurement models influence the results. Second, the results of a Monte Carlo simulation study, in the reflective case, will help to bring some clarity to a complex problem that has not yet been sufficiently studied.

idUS. Depósito de Investigación Universidad de Sevilla

On the predictive power of meta-features in OpenML

Author: Abelló Gamazo Alberto
Aluja Banet Tomàs
Bilalli Besim
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2017

The demand for performing data analysis is steadily rising. As a consequence, people of different profiles (i.e., inexperienced users) have started to analyze their data. However, this is challenging for them. A key step that poses difficulties and determines the success of the analysis is data mining (the model/algorithm selection problem). Meta-learning is a technique used for assisting non-expert users in this step. The effectiveness of meta-learning is, however, largely dependent on the description/characterization of datasets (i.e., the meta-features used for meta-learning). There is a need to improve the effectiveness of meta-learning by identifying and designing more predictive meta-features. In this work, we use a method from exploratory factor analysis to study the predictive power of different meta-features collected in OpenML, a collaborative machine learning platform designed to store and organize meta-data about datasets, data mining algorithms, models and their evaluations. We first use the method to extract latent features, which are abstract concepts that group together meta-features with common characteristics. Then, we study and visualize the relationship of the latent features with three different performance measures of four classification algorithms on hundreds of datasets available in OpenML, and we select the latent features with the highest predictive power. Finally, we use the selected latent features to perform meta-learning and we show that our method improves the meta-learning process. Furthermore, we design an easy-to-use application for retrieving different meta-data from OpenML, the biggest source of data in this domain.

Peer Reviewed. Postprint (published version)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblioteka Nauki - repozytorium artykułów

UPCommons. Portal del coneixement obert de la UPC

Modelling with heterogeneity

Author: Aluja Banet Tomàs
Lamberti Giuseppe
Sanchez Trujillo Gaston
Publication date: 01/01/2013

We present in this paper a methodology to deal with heterogeneity in modelling when the sources are unknown. Although the approach is general, we present it for PLS-PM latent variable modelling. We call this approach PATHMOX. The idea behind PATHMOX is to build a tree of path models whose structure resembles a binary decision tree, with a model for a different segment in each of its nodes. The split criterion is an F statistic for comparing structural models, based on testing the equality of the path coefficients. We emphasize the rationale of this approach and its limitations. Finally, we present an application to an Alumni Satisfaction survey.

Peer Reviewed. Postprint (published version)
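
The split criterion mentioned in this abstract can be illustrated with a minimal Python sketch of a Chow-type F statistic for the equality of a single path coefficient in two candidate segments. This is an assumption-laden simplification, not the PATHMOX implementation: it treats the latent variable scores as already computed and standardized, fits one coefficient per segment by least squares without an intercept, and uses the hypothetical names `sse_no_intercept` and `split_f_statistic` and made-up data.

```python
from typing import Sequence


def sse_no_intercept(x: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares fit y = b * x."""
    sxx = sum(v * v for v in x)
    b = sum(u * v for u, v in zip(x, y)) / sxx
    return sum((v - b * u) ** 2 for u, v in zip(x, y))


def split_f_statistic(
    xa: Sequence[float],
    ya: Sequence[float],
    xb: Sequence[float],
    yb: Sequence[float],
) -> float:
    """F statistic comparing one pooled coefficient with one per segment."""
    p = 1  # number of coefficients in each model (assumption of this sketch)
    n = len(xa) + len(xb)
    # Restricted model: both segments share the same coefficient.
    sse_pooled = sse_no_intercept(list(xa) + list(xb), list(ya) + list(yb))
    # Unrestricted model: each segment has its own coefficient.
    sse_split = sse_no_intercept(xa, ya) + sse_no_intercept(xb, yb)
    return ((sse_pooled - sse_split) / p) / (sse_split / (n - 2 * p))


# Hypothetical scores for two segments with clearly different slopes.
x_scores = [1.0, 2.0, 3.0, 4.0]
y_segment_a = [1.1, 1.9, 3.2, 3.8]
y_segment_b = [3.1, 5.9, 9.2, 11.8]
print(split_f_statistic(x_scores, y_segment_a, x_scores, y_segment_b))
```

A large value suggests that the two segments are better described by separate coefficients, which is the situation in which a tree-building procedure of this kind would favour the split. The sketch divides by the within-segment residual sum of squares, so it is undefined when both segments fit perfectly.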

UPCommons. Portal del coneixement obert de la UPC